A Finite-State Approach to Phrase-Based Statistical Machine Translation
نویسنده
چکیده
This paper presents a finite-state approach to phrase-based statistical machine translation where a log-linear modelling framework is implemented by means of an on-the-fly composition of weighted finite-state transducers. Moses, a well-known state-of-the-art system, is used as a machine translation reference in order to validate our results by comparison. Experiments on the TED corpus achieve a similar performance to that yielded by Moses.
منابع مشابه
A phrase-level machine translation approach for disfluency detection using weighted finite state transducers
We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-...
متن کاملPhrase-based finite state models
In the last years, statistical machine translation has already demonstrated its usefulness within a wide variety of translation applications. In this line, phrasebased alignment models have become the reference to follow in order to build competitive systems. Finite state models are always an interesting framework because there are well-known efficient algorithms for their representation and ma...
متن کاملFinite-state-based and phrase-based statistical machine translation
This paper shows the common framework that underlies the translation systems based on phrases or driven by finite state transducers, and summarizes a first comparison between them. In both approaches the translation process is based on pairs of source and target strings of words (segments) related by word alignment. Their main difference comes from the statistical modeling of the translation co...
متن کاملIntegrated Chinese word segmentation in statistical machine translation
A Chinese sentence is represented as a sequence of characters, and words are not separated from each other. In statistical machine translation, the conventional approach is to segment the Chinese character sequence into words during the pre-processing. The training and translation are performed afterwards. However, this method is not optimal for two reasons: 1. The segmentations may be erroneou...
متن کاملPhrasal Segmentation Models for Statistical Machine Translation
Phrasal segmentation models define a mapping from the words of a sentence to sequences of translatable phrases. We discuss the estimation of these models from large quantities of monolingual training text and describe their realization as weighted finite state transducers for incorporation into phrase-based statistical machine translation systems. Results are reported on the NIST Arabic-English...
متن کامل